Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)

Higher Education Students Performance Evaluation¶

  • The data was collected in 2019 from Turkish students at two faculties: the Faculty of Engineering and the Faculty of Educational Sciences.

  • The goal is to create an ML model that can predict student performance given the data taken from a survey.

  • The grades are categorical –AA, BA, BB, CB, CC, DC, DD, and Fail– hence the task should be modeled as a multi-class classification problem.


Data Set Information

The data contains results from a survey: columns 1-10 relate to personal questions, columns 11-16 are family-related, and the remaining columns cover education habits.

1. Exploratory Data Analysis¶

The outcome variable –the grades– shows an imbalanced distribution. While DD accounts for 25% of the data, BA and CB each hold less than 10%, and Fail represents only 5.5% of the whole dataset –only eight points. This will present a problem: the model will have few data points for learning to predict the Fail grade, but many more for learning to predict the DD grade.
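The imbalance described above can be inspected with a quick frequency count. The sketch below uses a reconstructed toy dataframe whose counts merely mimic the stated proportions (25% DD, 5.5% Fail, eight Fail points); the column name `GRADE` and the other class counts are assumptions, not the actual data.

```python
import pandas as pd

# Hypothetical stand-in for the survey dataframe; only the DD and Fail
# counts are chosen to match the proportions reported in the text.
df = pd.DataFrame({'GRADE': ['DD'] * 36 + ['CC'] * 30 + ['AA'] * 20 +
                            ['BB'] * 18 + ['DC'] * 16 + ['BA'] * 12 +
                            ['CB'] * 5 + ['Fail'] * 8})

# Relative frequency per grade, sorted descending.
counts = df['GRADE'].value_counts(normalize=True)
print(counts.round(3))
```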

2. Data Preparation¶

Data Transformation, K-Fold Cross-Validation, and Metrics

Precision and recall provide insights into the model's performance for each class individually, while accuracy gives an overall view of the model's correctness. Since this is a multi-class classification problem, precision and recall are calculated individually for each class and then averaged.

Precision: measures the proportion of correctly predicted grades out of all grades predicted as a specific grade. In this case, when predicting the AA grade, what proportion of all predicted AA grades were truly AA grades. The procedure is repeated for each individual grade. High precision indicates that the model is good at identifying a specific grade without confusing it with the other grades. However, it does not account for actual instances of a grade that the model failed to predict.

Recall: measures the proportion of correctly predicted grades out of all actual grades in the set. In this case, when predicting the AA grade, what proportion of all true AA grades were predicted as AA. The procedure is repeated for each individual grade. High recall indicates that the model is good at assigning most grades from each category to their real category.

Accuracy: measures the overall correctness of the model's predictions across all grades. It calculates the proportion of correctly predicted grades out of the total number of grades, providing an overall assessment of the model's performance across all categories. However, it may not be the most informative metric for imbalanced datasets, where the number of instances in each class varies significantly.
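A small sketch of how these weighted metrics are computed with scikit-learn; the labels below are toy values over three of the grade categories, not the study's actual predictions.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Illustrative ground truth and predictions (not the study's results).
y_true = ['AA', 'AA', 'BB', 'CC', 'CC', 'CC']
y_pred = ['AA', 'BB', 'BB', 'CC', 'CC', 'AA']

acc = accuracy_score(y_true, y_pred)  # overall correctness: 4/6
# average='weighted' computes the per-class score, then averages the
# scores weighted by each class's support.
prec = precision_score(y_true, y_pred, average='weighted', zero_division=0)
rec = recall_score(y_true, y_pred, average='weighted', zero_division=0)
print(acc, prec, rec)  # → 0.666..., 0.75, 0.666...
```

Note that for single-label multi-class data, support-weighted recall is mathematically equal to accuracy, since it sums the per-class true positives over the total count.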

Train and Test Subsets

Since the data is imbalanced, the split into train and test sets has to take the imbalance into account. A stratified split is needed so that the model can train on all possible outcomes –with a class distribution comparable to the one expected in unseen data.
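The stratified split described above can be sketched as follows; the data here is synthetic and the variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the survey features and (imbalanced) grade labels.
X, y = make_classification(n_samples=145, n_classes=4, n_informative=8,
                           weights=[0.4, 0.3, 0.2, 0.1], random_state=6064)

# stratify=y keeps the class proportions (roughly) equal in both subsets,
# so rare grades still appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=6064)

train_dist = np.bincount(y_train) / len(y_train)
test_dist = np.bincount(y_test) / len(y_test)
print(train_dist.round(2), test_dist.round(2))  # near-identical distributions
```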

3. Classification Study¶

The following list shows the estimators –and their parameters– that are studied to identify the best possible model:

from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

log = LogisticRegression(penalty=None, random_state=6064, solver='saga', max_iter=7500, multi_class='multinomial', n_jobs=-1)
l1 = LogisticRegression(penalty='l1', random_state=6064, solver='saga', max_iter=7500, multi_class='multinomial', n_jobs=-1)
l2 = LogisticRegression(penalty='l2', random_state=6064, solver='sag', max_iter=10500, multi_class='multinomial', n_jobs=-1)
net = LogisticRegression(penalty='elasticnet', random_state=6064, solver='saga', max_iter=10500, multi_class='multinomial', n_jobs=-1, l1_ratio=0.5)
sgd = SGDClassifier(loss='modified_huber', penalty=None, max_iter=7500, n_jobs=-1, random_state=6064)
mlp = MLPClassifier(solver='adam', max_iter=4500, random_state=6064)
dtc = DecisionTreeClassifier(random_state=6064)
rfc = RandomForestClassifier(random_state=6064, n_jobs=1)
etc = ExtraTreeClassifier(random_state=6064)
ets = ExtraTreesClassifier(random_state=6064, n_jobs=1)
abc = AdaBoostClassifier(random_state=6064)
gpc = GaussianProcessClassifier(kernel=RBF(0.05), random_state=6064, n_jobs=1)
gbc = GradientBoostingClassifier(loss='log_loss', random_state=6064)
svc = SVC(kernel=RBF(), probability=True)
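One way the estimators above could be compared under stratified K-fold cross-validation is sketched below with synthetic data; the fold count, scorer names, and the two-estimator subset are assumptions, not the study's exact procedure.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the survey data.
X, y = make_classification(n_samples=145, n_classes=4, n_informative=8,
                           random_state=6064)

# Stratified folds preserve the grade imbalance inside every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=6064)
scoring = {'acc': 'accuracy', 'rec': 'recall_weighted',
           'prec': 'precision_weighted', 'auc': 'roc_auc_ovr_weighted'}

results = {}
for name, est in [('logisticregression', LogisticRegression(max_iter=1000)),
                  ('randomforest', RandomForestClassifier(random_state=6064))]:
    scores = cross_validate(est, X, y, cv=cv, scoring=scoring)
    results[name] = {k: scores[f'test_{k}'].mean() for k in scoring}

print(results)  # mean CV score per metric, per estimator
```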

3.1 Standard Estimators¶

Model  Accuracy  Recall (weighted)  Precision (weighted)  AUC
logisticregression 0.250000 0.229200 0.188900 0.625700
logisticregression_l1 0.333300 0.291700 0.223600 0.714700
logisticregression_l2 0.250000 0.194400 0.156500 0.644300
logisticregression_elasticnet 0.305600 0.263900 0.210800 0.683800
sgd 0.277800 0.250000 0.193500 0.551800
mlp 0.361100 0.312500 0.241300 0.635900
decisiontree 0.166700 0.131900 0.103500 0.503500
randomforest 0.416700 0.347200 0.291700 0.695700
extratree 0.138900 0.152800 0.121500 0.512600
adaboost 0.250000 0.159700 0.084600 0.607700
extratrees 0.416700 0.375000 0.343100 0.723000
gaussianprocess 0.083300 0.125000 0.010400 0.500000
gradientboosting 0.388900 0.395800 0.309000 0.661400
svc 0.250000 0.125000 0.031200 0.326000

3.2 Standard Estimators & Feature Selection with Variance Threshold of 0.10¶

Model  Accuracy  Recall (weighted)  Precision (weighted)  AUC
logisticregression 0.277800 0.243100 0.214600 0.632400
logisticregression_l1 0.333300 0.284700 0.229900 0.716900
logisticregression_l2 0.277800 0.222200 0.199300 0.637900
logisticregression_elasticnet 0.277800 0.201400 0.165300 0.673600
sgd 0.250000 0.180600 0.194400 0.572400
mlp 0.250000 0.250000 0.156200 0.600300
decisiontree 0.250000 0.284700 0.211800 0.586200
randomforest 0.250000 0.222200 0.160100 0.660200
extratree 0.194400 0.138900 0.116000 0.510100
adaboost 0.194400 0.201400 0.168500 0.605000
extratrees 0.250000 0.208300 0.194400 0.706400
gaussianprocess 0.083300 0.125000 0.010400 0.500000
gradientboosting 0.305600 0.291700 0.215300 0.661900
svc 0.250000 0.125000 0.031200 0.333800
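The variance-threshold feature selection used in sections 3.2 and 3.3 can be sketched as below; the thresholds come from the section titles, while the data itself is synthetic.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(6064)
# Synthetic survey-like features: one nearly constant, two varied.
X = np.column_stack([
    rng.integers(0, 2, 200) * 0.05,   # near-constant column, tiny variance
    rng.integers(1, 5, 200),          # varied ordinal answers
    rng.integers(1, 8, 200),          # varied ordinal answers
])

# Drop every feature whose variance falls below the threshold.
selector = VarianceThreshold(threshold=0.10)
X_sel = selector.fit_transform(X)
print(X.shape, '->', X_sel.shape)  # the low-variance column is removed
```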

3.3 Standard Estimators & Feature Selection with Variance Threshold of 0.20¶

Model  Accuracy  Recall (weighted)  Precision (weighted)  AUC
logisticregression 0.416700 0.354200 0.354200 0.700400
logisticregression_l1 0.305600 0.250000 0.219200 0.677200
logisticregression_l2 0.277800 0.277800 0.213200 0.675700
logisticregression_elasticnet 0.250000 0.215300 0.173600 0.703400
sgd 0.222200 0.194400 0.135400 0.582000
mlp 0.333300 0.298600 0.183300 0.651500
decisiontree 0.250000 0.180600 0.152800 0.534000
randomforest 0.277800 0.250000 0.163200 0.674900
extratree 0.305600 0.284700 0.181900 0.591800
adaboost 0.222200 0.194400 0.140300 0.737700
extratrees 0.305600 0.243100 0.181900 0.741300
gaussianprocess 0.083300 0.125000 0.010400 0.500000
gradientboosting 0.250000 0.236100 0.148600 0.649300
svc 0.250000 0.125000 0.031200 0.364700

3.4. Grid Search for Best Estimators¶

Model  Accuracy  Recall (weighted)  Precision (weighted)  AUC
logisticregression 0.416700 0.354200 0.354200 0.700400
logisticregression_l1 0.444400 0.375000 0.361100 0.685400
logisticregression_l2 0.388900 0.319400 0.267400 0.699700
logisticregression_elasticnet 0.444400 0.375000 0.361100 0.691400
sgd_l1 0.444400 0.340300 0.229700 0.760700
mlp 0.416700 0.395800 0.276400 0.678200
decisiontree 0.444400 0.409700 0.315300 0.671300
randomforest 0.472200 0.395800 0.313900 0.710800
extratree 0.444400 0.409700 0.291700 0.723300
adaboost 0.472200 0.444400 0.308300 0.777200
extratrees 0.555600 0.513900 0.465800 0.744000
gaussianprocess 0.083300 0.125000 0.010400 0.500000
gradientboosting 0.388900 0.312500 0.152200 0.626500
svc 0.444400 0.333300 0.236100 0.633700
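The grid-search step can be sketched as follows; the parameter grid and the choice of scorer are illustrative assumptions (the extra-trees ensemble is used because it scores best in this table), not the grids actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the survey data.
X, y = make_classification(n_samples=145, n_classes=4, n_informative=8,
                           random_state=6064)

# Hypothetical grid for the extra-trees ensemble.
param_grid = {'n_estimators': [100, 300],
              'max_depth': [None, 8],
              'min_samples_leaf': [1, 3]}

search = GridSearchCV(ExtraTreesClassifier(random_state=6064),
                      param_grid,
                      scoring='recall_weighted',
                      cv=StratifiedKFold(5, shuffle=True, random_state=6064),
                      n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```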

3.5 Comparison Between Procedures¶

4. Test Set Performance¶

Model  Accuracy  Recall (weighted)  Precision (weighted)  AUC
logisticregression 0.310300 0.310300 0.316300 0.721700
logisticregression_l1 0.275900 0.275900 0.422400 0.734400
logisticregression_l2 0.344800 0.344800 0.405400 0.720700
logisticregression_elasticnet 0.344800 0.344800 0.405400 0.737800
sgd_l1 0.206900 0.206900 0.115000 0.628700
mlp 0.137900 0.137900 0.133600 0.627800
decisiontree 0.206900 0.206900 0.181000 0.486100
randomforest 0.310300 0.310300 0.250000 0.715900
extratree 0.172400 0.172400 0.232200 0.566100
adaboost 0.172400 0.172400 0.137900 0.538100
extratrees 0.172400 0.172400 0.187700 0.538900
gaussianprocess 0.069000 0.069000 0.004800 0.500000
gradientboosting 0.275900 0.275900 0.171300 0.605500
svc 0.206900 0.206900 0.298300 0.565500

5 Best Classifiers for Grade Prediction¶

5.1.a Best Logistic Regression –elasticnet– Classifier¶


Overall Test Performance Report

		 TRAIN 	 TEST

Accuracy: 	 0.929 	 0.345
Recall: 	 0.929 	 0.345
Precision: 	 0.931 	 0.405

AUC: 		 0.995 	 0.721


Test Set Classification Report

              precision    recall  f1-score   support

          AA       1.00      0.33      0.50         3
          BA       1.00      0.67      0.80         3
          BB       0.67      0.67      0.67         3
          CB       0.00      0.00      0.00         2
          CC       0.27      0.75      0.40         4
          DC       0.33      0.20      0.25         5
          DD       0.14      0.14      0.14         7
        Fail       0.00      0.00      0.00         2

    accuracy                           0.34        29
   macro avg       0.43      0.34      0.34        29
weighted avg       0.41      0.34      0.34        29
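Per-class reports like the one above can be produced with scikit-learn's classification_report; the labels below are toy values, not the study's predictions.

```python
from sklearn.metrics import classification_report

# Illustrative ground truth and predictions over a few grade categories.
y_true = ['AA', 'AA', 'BB', 'CC', 'CC', 'Fail']
y_pred = ['AA', 'BB', 'BB', 'CC', 'CC', 'CC']

# zero_division=0 silences warnings for classes that were never predicted,
# which is common with rare grades such as Fail.
report = classification_report(y_true, y_pred, zero_division=0)
print(report)  # per-class precision/recall/f1 plus macro and weighted averages
```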

5.1.b Best Random Forest Classifier¶


Overall Test Performance Report

		 TRAIN 	 TEST

Accuracy: 	 1.000 	 0.310
Recall: 	 1.000 	 0.310
Precision: 	 1.000 	 0.250

AUC: 		 1.000 	 0.716


Test Set Classification Report

              precision    recall  f1-score   support

          AA       1.00      0.67      0.80         3
          BA       0.00      0.00      0.00         3
          BB       0.00      0.00      0.00         3
          CB       0.00      0.00      0.00         2
          CC       0.17      0.25      0.20         4
          DC       0.25      0.20      0.22         5
          DD       0.33      0.71      0.45         7
        Fail       0.00      0.00      0.00         2

    accuracy                           0.31        29
   macro avg       0.22      0.23      0.21        29
weighted avg       0.25      0.31      0.26        29

CONCLUSION¶

In conclusion, the evaluation of the various models reveals their performance on this classification task. The results demonstrate the impact of feature selection and hyperparameter optimization on model performance. The best-performing model, the elastic-net Logistic Regression, shows promising results in terms of accuracy, recall, precision, and AUC compared to the other classifiers.

Nonetheless, the performance of this model is still poor, given that the tuning process fits a single model across all grades rather than specializing on any one of them. If a dedicated model is developed and fine-tuned per grade, better classification performance can be achieved.